{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "fa5c9ab4",
   "metadata": {},
   "source": [
    "# 曲奇饼案例\n",
    "        假设有两碗曲奇饼，碗 A 包含 30 个香草曲奇饼和 10 个巧克力曲奇，碗 B 中这两种曲奇各 20 个。现在假设你在不看的情况下随机地挑一碗那一块饼，得到了一块香草曲奇饼。\n",
    "        案例的问题是：从碗A取到香草曲奇饼的概率是多少？\n",
    "        P（香草曲奇饼 | 碗 A ）=？\n",
    "        \n",
    "P（香草曲奇饼|碗A）* P(碗A) = P（碗 A | 香草曲奇饼）* P（香草曲奇饼）\n",
    "\n",
    "P（碗 A | 香草曲奇饼） = P（香草曲奇饼|碗A）* P(碗A) /  P（香草曲奇饼） = 3/4* 1/2 / （5/8） = 3/8 * 8/5 = 3/5\n",
    "\n",
    "\n",
    "P（碗A） = 1/2\n",
    "\n",
    "P(碗A | 香草曲奇饼 ） = ？\n",
    "\n",
    "P（香草曲奇饼） = P（香草曲奇饼 |碗B）* P（碗B）+P（香草曲奇饼|碗A）* P（碗A）=1/2 * 1/2+ 3/4 * 1/2 = 2/8 + 3/8= 5/8\n",
    "\n",
    "P（香草曲奇饼|碗A） = 3/4"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eb0b3691",
   "metadata": {},
   "source": [
    "定义 Bayes 类，初始化创建一个dict 类型的容器 container——存储贝叶斯各项信息。\n",
    "* key键：存储假设\n",
    "* value值： 存储概率\n",
    "\n",
    "\n",
    "* 定义 Set 方法：给容器添加先验假设及先验概率\n",
    "* 定义 Mult 方法：根据 key 查找到先验概率，并更新概率\n",
    "* 定义 Normalize 方法：归一化\n",
    "* 定义 Prob 方法：返回某一事件的概率"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "5bacde04",
   "metadata": {},
   "outputs": [],
   "source": [
    "class Bayes:\n",
    "    def __init__(self):\n",
    "        self._container = dict()\n",
    "    \n",
    "    def Set(self,hypothis,prob):\n",
    "        self._container[hypothis] = prob\n",
    "    \n",
    "    def Mult(self,hypothis,prob):\n",
    "        old_prob = self._container[hypothis]\n",
    "        self._container[hypothis] = old_prob * prob\n",
    "        \n",
    "        \n",
    "    def Normalize(self):\n",
    "        count = 0\n",
    "        for hypothis in self._container.values():\n",
    "            count += hypothis\n",
    "        for hypothis,prob in self._container.items():\n",
    "            print(self._container[hypothis],count)\n",
    "            self._container[hypothis] = self._container[hypothis] / count\n",
    "            \n",
    "    def Prob(self,hypothis):\n",
    "        return self._container[hypothis]\n",
    "    \n",
    "    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "60c7f53f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.375 0.625\n",
      "0.25 0.625\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "0.6"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 实例化 Bayes\n",
    "bayes = Bayes()\n",
    "\n",
    "bayes.Set('Bow_A',0.5)\n",
    "bayes.Set('Bow_B',0.5)\n",
    "\n",
    "bayes.Mult('Bow_A',3/4)\n",
    "bayes.Mult('Bow_B',1/2)\n",
    "\n",
    "bayes.Normalize()\n",
    "\n",
    "prob = bayes.Prob('Bow_A')\n",
    "prob"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "14871b5c",
   "metadata": {},
   "source": [
    "# 单词拼写纠错实战"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "097ec6eb",
   "metadata": {},
   "source": [
    "**max（P(c) * P(w|c) / P(w)）**\n",
    "\n",
    "P(w) = P(c1) * P(w|c1) + P(c2) * P(w|c2)……"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "70e21e67",
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "import string\n",
    "\n",
    "filename = 'data/单词纠错/big.txt'\n",
    "open_r = open(filename,'r')\n",
    "big_info = open_r.read()\n",
    "open_r.close()\n",
    "\n",
    "alphabet = string.ascii_lowercase"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "437f17a1",
   "metadata": {
    "collapsed": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'the': 80030,\n",
       " 'project': 288,\n",
       " 'gutenberg': 263,\n",
       " 'ebook': 87,\n",
       " 'of': 40025,\n",
       " 'adventures': 17,\n",
       " 'sherlock': 101,\n",
       " 'holmes': 467,\n",
       " 'by': 6738,\n",
       " 'sir': 177,\n",
       " 'arthur': 34,\n",
       " 'conan': 4,\n",
       " 'doyle': 5,\n",
       " 'in': 22050,\n",
       " 'our': 1066,\n",
       " 'series': 128,\n",
       " 'copyright': 69,\n",
       " 'laws': 233,\n",
       " 'are': 3630,\n",
       " 'changing': 44,\n",
       " 'all': 4144,\n",
       " 'over': 1282,\n",
       " 'world': 362,\n",
       " 'be': 6155,\n",
       " 'sure': 123,\n",
       " 'to': 28766,\n",
       " 'check': 38,\n",
       " 'for': 6941,\n",
       " 'your': 1279,\n",
       " 'country': 423,\n",
       " 'before': 1363,\n",
       " 'downloading': 5,\n",
       " 'or': 5352,\n",
       " 'redistributing': 7,\n",
       " 'this': 4063,\n",
       " 'any': 1204,\n",
       " 'other': 1502,\n",
       " 'header': 7,\n",
       " 'should': 1297,\n",
       " 'first': 1177,\n",
       " 'thing': 303,\n",
       " 'seen': 444,\n",
       " 'when': 2923,\n",
       " 'viewing': 7,\n",
       " 'file': 22,\n",
       " 'please': 172,\n",
       " 'do': 1503,\n",
       " 'not': 6626,\n",
       " 'remove': 53,\n",
       " 'it': 10681,\n",
       " 'change': 150,\n",
       " 'edit': 4,\n",
       " 'without': 1015,\n",
       " 'written': 117,\n",
       " 'permission': 52,\n",
       " 'read': 219,\n",
       " 'legal': 52,\n",
       " 'small': 527,\n",
       " 'print': 50,\n",
       " 'and': 38313,\n",
       " 'information': 73,\n",
       " 'about': 1497,\n",
       " 'at': 6791,\n",
       " 'bottom': 42,\n",
       " 'included': 43,\n",
       " 'is': 9774,\n",
       " 'important': 285,\n",
       " 'specific': 37,\n",
       " 'rights': 168,\n",
       " 'restrictions': 23,\n",
       " 'how': 1315,\n",
       " 'may': 2551,\n",
       " 'used': 276,\n",
       " 'you': 5622,\n",
       " 'can': 1095,\n",
       " 'also': 778,\n",
       " 'find': 294,\n",
       " 'out': 1987,\n",
       " 'make': 504,\n",
       " 'a': 21155,\n",
       " 'donation': 10,\n",
       " 'get': 468,\n",
       " 'involved': 107,\n",
       " 'welcome': 18,\n",
       " 'free': 421,\n",
       " 'plain': 108,\n",
       " 'vanilla': 6,\n",
       " 'electronic': 58,\n",
       " 'texts': 7,\n",
       " 'ebooks': 54,\n",
       " 'readable': 13,\n",
       " 'both': 529,\n",
       " 'humans': 2,\n",
       " 'computers': 7,\n",
       " 'since': 260,\n",
       " 'these': 1231,\n",
       " 'were': 4289,\n",
       " 'prepared': 138,\n",
       " 'thousands': 93,\n",
       " 'volunteers': 22,\n",
       " 'title': 39,\n",
       " 'author': 29,\n",
       " 'release': 28,\n",
       " 'date': 48,\n",
       " 'march': 135,\n",
       " 'most': 908,\n",
       " 'recently': 30,\n",
       " 'updated': 4,\n",
       " 'november': 41,\n",
       " 'edition': 21,\n",
       " 'language': 61,\n",
       " 'english': 211,\n",
       " 'character': 174,\n",
       " 'set': 325,\n",
       " 'encoding': 5,\n",
       " 'ascii': 11,\n",
       " 'start': 67,\n",
       " 'additional': 30,\n",
       " 'editing': 6,\n",
       " 'jose': 1,\n",
       " 'menendez': 1,\n",
       " 'contents': 50,\n",
       " 'i': 7687,\n",
       " 'scandal': 19,\n",
       " 'bohemia': 15,\n",
       " 'ii': 77,\n",
       " 'red': 288,\n",
       " 'headed': 37,\n",
       " 'league': 53,\n",
       " 'iii': 91,\n",
       " 'case': 438,\n",
       " 'identity': 11,\n",
       " 'iv': 55,\n",
       " 'boscombe': 16,\n",
       " 'valley': 78,\n",
       " 'mystery': 39,\n",
       " 'v': 51,\n",
       " 'five': 279,\n",
       " 'orange': 23,\n",
       " 'pips': 12,\n",
       " 'vi': 37,\n",
       " 'man': 1652,\n",
       " 'with': 9740,\n",
       " 'twisted': 21,\n",
       " 'lip': 56,\n",
       " 'vii': 34,\n",
       " 'adventure': 34,\n",
       " 'blue': 143,\n",
       " 'carbuncle': 17,\n",
       " 'viii': 39,\n",
       " 'speckled': 5,\n",
       " 'band': 54,\n",
       " 'ix': 28,\n",
       " 'engineer': 12,\n",
       " 's': 5636,\n",
       " 'thumb': 51,\n",
       " 'x': 136,\n",
       " 'noble': 48,\n",
       " 'bachelor': 18,\n",
       " 'xi': 28,\n",
       " 'beryl': 4,\n",
       " 'coronet': 29,\n",
       " 'xii': 28,\n",
       " 'copper': 26,\n",
       " 'beeches': 12,\n",
       " 'she': 3946,\n",
       " 'always': 608,\n",
       " 'woman': 325,\n",
       " 'have': 3493,\n",
       " 'seldom': 76,\n",
       " 'heard': 636,\n",
       " 'him': 5230,\n",
       " 'mention': 46,\n",
       " 'her': 5284,\n",
       " 'under': 963,\n",
       " 'name': 262,\n",
       " 'his': 10034,\n",
       " 'eyes': 939,\n",
       " 'eclipses': 2,\n",
       " 'predominates': 3,\n",
       " 'whole': 744,\n",
       " 'sex': 11,\n",
       " 'was': 11410,\n",
       " 'that': 12512,\n",
       " 'he': 12401,\n",
       " 'felt': 697,\n",
       " 'emotion': 36,\n",
       " 'akin': 14,\n",
       " 'love': 484,\n",
       " 'irene': 18,\n",
       " 'adler': 16,\n",
       " 'emotions': 10,\n",
       " 'one': 3371,\n",
       " 'particularly': 174,\n",
       " 'abhorrent': 1,\n",
       " 'cold': 257,\n",
       " 'precise': 13,\n",
       " 'but': 5653,\n",
       " 'admirably': 7,\n",
       " 'balanced': 6,\n",
       " 'mind': 341,\n",
       " 'take': 616,\n",
       " 'perfect': 39,\n",
       " 'reasoning': 41,\n",
       " 'observing': 21,\n",
       " 'machine': 39,\n",
       " 'has': 1603,\n",
       " 'as': 8064,\n",
       " 'lover': 26,\n",
       " 'would': 1953,\n",
       " 'placed': 182,\n",
       " 'himself': 1158,\n",
       " 'false': 64,\n",
       " 'position': 432,\n",
       " 'never': 593,\n",
       " 'spoke': 218,\n",
       " 'softer': 10,\n",
       " 'passions': 29,\n",
       " 'save': 110,\n",
       " 'gibe': 2,\n",
       " 'sneer': 6,\n",
       " 'they': 3938,\n",
       " 'admirable': 14,\n",
       " 'things': 321,\n",
       " 'observer': 13,\n",
       " 'excellent': 62,\n",
       " 'drawing': 240,\n",
       " 'veil': 16,\n",
       " 'from': 5709,\n",
       " 'men': 1145,\n",
       " 'motives': 14,\n",
       " 'actions': 77,\n",
       " 'trained': 23,\n",
       " 'reasoner': 6,\n",
       " 'admit': 65,\n",
       " 'such': 1436,\n",
       " 'intrusions': 1,\n",
       " 'into': 2124,\n",
       " 'own': 785,\n",
       " 'delicate': 54,\n",
       " 'finely': 11,\n",
       " 'adjusted': 16,\n",
       " 'temperament': 5,\n",
       " 'introduce': 23,\n",
       " 'distracting': 1,\n",
       " 'factor': 41,\n",
       " 'which': 4842,\n",
       " 'might': 536,\n",
       " 'throw': 48,\n",
       " 'doubt': 152,\n",
       " 'upon': 1111,\n",
       " 'mental': 37,\n",
       " 'results': 229,\n",
       " 'grit': 1,\n",
       " 'sensitive': 35,\n",
       " 'instrument': 35,\n",
       " 'crack': 20,\n",
       " 'high': 290,\n",
       " 'power': 548,\n",
       " 'lenses': 1,\n",
       " 'more': 1997,\n",
       " 'disturbing': 9,\n",
       " 'than': 1206,\n",
       " 'strong': 168,\n",
       " 'nature': 170,\n",
       " 'yet': 488,\n",
       " 'there': 2972,\n",
       " 'late': 165,\n",
       " 'dubious': 1,\n",
       " 'questionable': 3,\n",
       " 'memory': 55,\n",
       " 'had': 7383,\n",
       " 'little': 1001,\n",
       " 'lately': 22,\n",
       " 'my': 2249,\n",
       " 'marriage': 96,\n",
       " 'drifted': 5,\n",
       " 'us': 684,\n",
       " 'away': 838,\n",
       " 'each': 411,\n",
       " 'complete': 145,\n",
       " 'happiness': 143,\n",
       " 'home': 295,\n",
       " 'centred': 2,\n",
       " 'interests': 118,\n",
       " 'rise': 240,\n",
       " 'up': 2284,\n",
       " 'around': 271,\n",
       " 'who': 3050,\n",
       " 'finds': 23,\n",
       " 'master': 141,\n",
       " 'establishment': 40,\n",
       " 'sufficient': 75,\n",
       " 'absorb': 4,\n",
       " 'attention': 191,\n",
       " 'while': 768,\n",
       " 'loathed': 1,\n",
       " 'every': 650,\n",
       " 'form': 507,\n",
       " 'society': 169,\n",
       " 'bohemian': 8,\n",
       " 'soul': 168,\n",
       " 'remained': 231,\n",
       " 'lodgings': 11,\n",
       " 'baker': 49,\n",
       " 'street': 180,\n",
       " 'buried': 21,\n",
       " 'among': 451,\n",
       " 'old': 1180,\n",
       " 'books': 59,\n",
       " 'alternating': 2,\n",
       " 'week': 95,\n",
       " 'between': 654,\n",
       " 'cocaine': 4,\n",
       " 'ambition': 13,\n",
       " 'drowsiness': 4,\n",
       " 'drug': 21,\n",
       " 'fierce': 12,\n",
       " 'energy': 45,\n",
       " 'keen': 32,\n",
       " 'still': 922,\n",
       " 'ever': 274,\n",
       " 'deeply': 77,\n",
       " 'attracted': 36,\n",
       " 'study': 144,\n",
       " 'crime': 61,\n",
       " 'occupied': 116,\n",
       " 'immense': 77,\n",
       " 'faculties': 8,\n",
       " 'extraordinary': 74,\n",
       " 'powers': 149,\n",
       " 'observation': 39,\n",
       " 'following': 208,\n",
       " 'those': 1201,\n",
       " 'clues': 3,\n",
       " 'clearing': 29,\n",
       " 'mysteries': 9,\n",
       " 'been': 2599,\n",
       " 'abandoned': 72,\n",
       " 'hopeless': 17,\n",
       " 'official': 91,\n",
       " 'police': 94,\n",
       " 'time': 1529,\n",
       " 'some': 1536,\n",
       " 'vague': 39,\n",
       " 'account': 177,\n",
       " 'doings': 11,\n",
       " 'summons': 11,\n",
       " 'odessa': 3,\n",
       " 'trepoff': 1,\n",
       " 'murder': 30,\n",
       " 'singular': 36,\n",
       " 'tragedy': 9,\n",
       " 'atkinson': 1,\n",
       " 'brothers': 50,\n",
       " 'trincomalee': 1,\n",
       " 'finally': 156,\n",
       " 'mission': 34,\n",
       " 'accomplished': 39,\n",
       " 'so': 3017,\n",
       " 'delicately': 3,\n",
       " 'successfully': 25,\n",
       " 'reigning': 3,\n",
       " 'family': 210,\n",
       " 'holland': 12,\n",
       " 'beyond': 225,\n",
       " 'signs': 98,\n",
       " 'activity': 131,\n",
       " 'however': 430,\n",
       " 'merely': 189,\n",
       " 'shared': 25,\n",
       " 'readers': 11,\n",
       " 'daily': 44,\n",
       " 'press': 81,\n",
       " 'knew': 496,\n",
       " 'former': 177,\n",
       " 'friend': 283,\n",
       " 'companion': 81,\n",
       " 'night': 385,\n",
       " 'on': 6643,\n",
       " 'twentieth': 19,\n",
       " 'returning': 68,\n",
       " 'journey': 69,\n",
       " 'patient': 383,\n",
       " 'now': 1697,\n",
       " 'returned': 194,\n",
       " 'civil': 177,\n",
       " 'practice': 95,\n",
       " 'way': 859,\n",
       " 'led': 196,\n",
       " 'me': 1920,\n",
       " 'through': 815,\n",
       " 'passed': 367,\n",
       " 'well': 1198,\n",
       " 'remembered': 120,\n",
       " 'door': 498,\n",
       " 'must': 955,\n",
       " 'associated': 196,\n",
       " 'wooing': 2,\n",
       " 'dark': 181,\n",
       " 'incidents': 14,\n",
       " 'scarlet': 22,\n",
       " 'seized': 114,\n",
       " 'desire': 96,\n",
       " 'see': 1101,\n",
       " 'again': 866,\n",
       " 'know': 1048,\n",
       " 'employing': 7,\n",
       " 'rooms': 86,\n",
       " 'brilliantly': 5,\n",
       " 'lit': 74,\n",
       " 'even': 946,\n",
       " 'looked': 760,\n",
       " 'saw': 599,\n",
       " 'tall': 74,\n",
       " 'spare': 27,\n",
       " 'figure': 103,\n",
       " 'pass': 154,\n",
       " 'twice': 84,\n",
       " 'silhouette': 1,\n",
       " 'against': 660,\n",
       " 'blind': 23,\n",
       " 'pacing': 26,\n",
       " 'room': 960,\n",
       " 'swiftly': 38,\n",
       " 'eagerly': 39,\n",
       " 'head': 725,\n",
       " 'sunk': 27,\n",
       " 'chest': 81,\n",
       " 'hands': 455,\n",
       " 'clasped': 11,\n",
       " 'behind': 401,\n",
       " 'mood': 51,\n",
       " 'habit': 55,\n",
       " 'attitude': 72,\n",
       " 'manner': 135,\n",
       " 'told': 490,\n",
       " 'their': 2955,\n",
       " 'story': 133,\n",
       " 'work': 382,\n",
       " 'risen': 30,\n",
       " 'created': 62,\n",
       " 'dreams': 16,\n",
       " 'hot': 119,\n",
       " 'scent': 17,\n",
       " 'new': 1211,\n",
       " 'problem': 76,\n",
       " 'rang': 29,\n",
       " 'bell': 65,\n",
       " 'shown': 113,\n",
       " 'chamber': 35,\n",
       " 'formerly': 77,\n",
       " 'part': 704,\n",
       " 'effusive': 2,\n",
       " 'glad': 150,\n",
       " 'think': 557,\n",
       " 'hardly': 173,\n",
       " 'word': 298,\n",
       " 'spoken': 92,\n",
       " 'kindly': 86,\n",
       " 'eye': 110,\n",
       " 'waved': 29,\n",
       " 'an': 3423,\n",
       " 'armchair': 49,\n",
       " 'threw': 96,\n",
       " 'across': 222,\n",
       " 'cigars': 7,\n",
       " 'indicated': 88,\n",
       " 'spirit': 167,\n",
       " 'gasogene': 1,\n",
       " 'corner': 128,\n",
       " 'then': 1558,\n",
       " 'stood': 383,\n",
       " 'fire': 274,\n",
       " 'introspective': 3,\n",
       " 'fashion': 49,\n",
       " 'wedlock': 1,\n",
       " 'suits': 8,\n",
       " 'remarked': 169,\n",
       " 'watson': 83,\n",
       " 'put': 435,\n",
       " 'seven': 132,\n",
       " 'half': 318,\n",
       " 'pounds': 26,\n",
       " 'answered': 226,\n",
       " 'indeed': 139,\n",
       " 'thought': 902,\n",
       " 'just': 767,\n",
       " 'trifle': 11,\n",
       " 'fancy': 50,\n",
       " 'observe': 37,\n",
       " 'did': 1875,\n",
       " 'tell': 492,\n",
       " 'intended': 58,\n",
       " 'go': 905,\n",
       " 'harness': 27,\n",
       " 'deduce': 14,\n",
       " 'getting': 92,\n",
       " 'yourself': 162,\n",
       " 'very': 1340,\n",
       " 'wet': 60,\n",
       " 'clumsy': 8,\n",
       " 'careless': 14,\n",
       " 'servant': 46,\n",
       " 'girl': 166,\n",
       " 'dear': 449,\n",
       " 'said': 3464,\n",
       " 'too': 548,\n",
       " 'much': 671,\n",
       " 'certainly': 119,\n",
       " 'burned': 77,\n",
       " 'lived': 113,\n",
       " 'few': 458,\n",
       " 'centuries': 12,\n",
       " 'ago': 108,\n",
       " 'true': 205,\n",
       " 'walk': 75,\n",
       " 'thursday': 7,\n",
       " 'came': 979,\n",
       " 'dreadful': 68,\n",
       " 'mess': 10,\n",
       " 'changed': 134,\n",
       " 'clothes': 62,\n",
       " 't': 1318,\n",
       " 'imagine': 96,\n",
       " 'mary': 705,\n",
       " 'jane': 2,\n",
       " 'incorrigible': 2,\n",
       " 'wife': 367,\n",
       " 'given': 364,\n",
       " 'notice': 98,\n",
       " 'fail': 40,\n",
       " 'chuckled': 7,\n",
       " 'rubbed': 32,\n",
       " 'long': 991,\n",
       " 'nervous': 54,\n",
       " 'together': 260,\n",
       " 'simplicity': 30,\n",
       " 'itself': 273,\n",
       " 'inside': 43,\n",
       " 'left': 834,\n",
       " 'shoe': 11,\n",
       " 'where': 977,\n",
       " 'firelight': 2,\n",
       " 'strikes': 19,\n",
       " 'leather': 35,\n",
       " 'scored': 4,\n",
       " 'six': 176,\n",
       " 'almost': 325,\n",
       " 'parallel': 17,\n",
       " 'cuts': 5,\n",
       " 'obviously': 38,\n",
       " 'caused': 102,\n",
       " 'someone': 160,\n",
       " 'carelessly': 14,\n",
       " 'scraped': 21,\n",
       " 'round': 556,\n",
       " 'edges': 70,\n",
       " 'sole': 70,\n",
       " 'order': 404,\n",
       " 'crusted': 2,\n",
       " 'mud': 36,\n",
       " 'hence': 32,\n",
       " 'double': 49,\n",
       " 'deduction': 12,\n",
       " 'vile': 16,\n",
       " 'weather': 42,\n",
       " 'malignant': 88,\n",
       " 'boot': 22,\n",
       " 'slitting': 2,\n",
       " 'specimen': 14,\n",
       " 'london': 76,\n",
       " 'slavey': 1,\n",
       " 'if': 2373,\n",
       " 'gentleman': 99,\n",
       " 'walks': 10,\n",
       " 'smelling': 5,\n",
       " 'iodoform': 43,\n",
       " 'black': 235,\n",
       " 'mark': 38,\n",
       " 'nitrate': 7,\n",
       " 'silver': 128,\n",
       " 'right': 710,\n",
       " 'forefinger': 7,\n",
       " 'bulge': 2,\n",
       " 'side': 511,\n",
       " 'top': 42,\n",
       " 'hat': 105,\n",
       " 'show': 213,\n",
       " 'secreted': 2,\n",
       " 'stethoscope': 2,\n",
       " 'dull': 74,\n",
       " 'pronounce': 9,\n",
       " 'active': 96,\n",
       " 'member': 50,\n",
       " 'medical': 22,\n",
       " 'profession': 22,\n",
       " 'could': 1700,\n",
       " 'help': 230,\n",
       " 'laughing': 115,\n",
       " 'ease': 44,\n",
       " 'explained': 60,\n",
       " 'process': 219,\n",
       " 'hear': 183,\n",
       " 'give': 523,\n",
       " 'reasons': 64,\n",
       " 'appears': 108,\n",
       " 'ridiculously': 1,\n",
       " 'simple': 139,\n",
       " 'easily': 114,\n",
       " 'myself': 227,\n",
       " 'though': 650,\n",
       " 'successive': 17,\n",
       " 'instance': 50,\n",
       " 'am': 746,\n",
       " 'baffled': 8,\n",
       " 'until': 325,\n",
       " 'explain': 123,\n",
       " 'believe': 183,\n",
       " 'good': 744,\n",
       " 'yours': 46,\n",
       " 'quite': 502,\n",
       " 'lighting': 16,\n",
       " 'cigarette': 6,\n",
       " 'throwing': 46,\n",
       " 'down': 1128,\n",
       " 'distinction': 19,\n",
       " 'clear': 233,\n",
       " 'example': 286,\n",
       " 'frequently': 218,\n",
       " 'steps': 188,\n",
       " 'lead': 137,\n",
       " 'hall': 83,\n",
       " 'often': 443,\n",
       " 'hundreds': 48,\n",
       " 'times': 236,\n",
       " 'many': 609,\n",
       " 'don': 581,\n",
       " 'observed': 131,\n",
       " 'point': 223,\n",
       " 'seventeen': 10,\n",
       " 'because': 630,\n",
       " 'interested': 65,\n",
       " 'problems': 78,\n",
       " 'enough': 175,\n",
       " 'chronicle': 7,\n",
       " 'two': 1138,\n",
       " 'trifling': 12,\n",
       " 'experiences': 11,\n",
       " 'sheet': 29,\n",
       " 'thick': 77,\n",
       " 'pink': 27,\n",
       " 'tinted': 9,\n",
       " 'notepaper': 2,\n",
       " 'lying': 118,\n",
       " 'open': 325,\n",
       " 'table': 296,\n",
       " 'last': 565,\n",
       " 'post': 117,\n",
       " 'aloud': 28,\n",
       " 'note': 115,\n",
       " 'undated': 1,\n",
       " 'either': 293,\n",
       " 'signature': 9,\n",
       " 'address': 76,\n",
       " 'will': 1577,\n",
       " 'call': 197,\n",
       " 'quarter': 46,\n",
       " 'eight': 128,\n",
       " 'o': 257,\n",
       " 'clock': 120,\n",
       " 'desires': 22,\n",
       " 'consult': 19,\n",
       " 'matter': 365,\n",
       " 'deepest': 15,\n",
       " 'moment': 487,\n",
       " 'recent': 54,\n",
       " 'services': 38,\n",
       " 'royal': 111,\n",
       " 'houses': 117,\n",
       " 'europe': 153,\n",
       " 'safely': 11,\n",
       " 'trusted': 16,\n",
       " 'matters': 136,\n",
       " 'importance': 117,\n",
       " 'exaggerated': 28,\n",
       " 'we': 1906,\n",
       " 'quarters': 72,\n",
       " 'received': 280,\n",
       " 'hour': 157,\n",
       " 'amiss': 6,\n",
       " 'visitor': 74,\n",
       " 'wear': 30,\n",
       " 'mask': 12,\n",
       " 'what': 3011,\n",
       " 'means': 253,\n",
       " 'no': 2348,\n",
       " 'data': 17,\n",
       " 'capital': 144,\n",
       " 'mistake': 39,\n",
       " 'theorise': 1,\n",
       " 'insensibly': 2,\n",
       " 'begins': 47,\n",
       " 'twist': 14,\n",
       " 'facts': 72,\n",
       " 'suit': 25,\n",
       " 'theories': 21,\n",
       " 'instead': 137,\n",
       " 'carefully': 72,\n",
       " 'examined': 49,\n",
       " 'writing': 69,\n",
       " 'paper': 177,\n",
       " 'wrote': 149,\n",
       " 'presumably': 8,\n",
       " 'endeavouring': 8,\n",
       " 'imitate': 7,\n",
       " 'processes': 35,\n",
       " 'bought': 55,\n",
       " 'crown': 61,\n",
       " 'packet': 11,\n",
       " 'peculiarly': 14,\n",
       " 'stiff': 20,\n",
       " 'peculiar': 84,\n",
       " 'hold': 114,\n",
       " 'light': 278,\n",
       " 'large': 483,\n",
       " 'e': 136,\n",
       " 'g': 56,\n",
       " 'p': 66,\n",
       " 'woven': 5,\n",
       " 'texture': 6,\n",
       " 'asked': 777,\n",
       " 'maker': 4,\n",
       " 'monogram': 4,\n",
       " 'rather': 219,\n",
       " 'stands': 19,\n",
       " 'gesellschaft': 1,\n",
       " 'german': 196,\n",
       " 'company': 192,\n",
       " 'customary': 19,\n",
       " 'contraction': 61,\n",
       " 'like': 1080,\n",
       " 'co': 30,\n",
       " 'course': 389,\n",
       " 'papier': 1,\n",
       " 'eg': 1,\n",
       " 'let': 506,\n",
       " 'glance': 91,\n",
       " 'continental': 46,\n",
       " 'gazetteer': 1,\n",
       " 'took': 573,\n",
       " 'heavy': 139,\n",
       " 'brown': 71,\n",
       " 'volume': 30,\n",
       " 'shelves': 3,\n",
       " 'eglow': 1,\n",
       " 'eglonitz': 1,\n",
       " 'here': 691,\n",
       " 'egria': 1,\n",
       " 'speaking': 185,\n",
       " 'far': 408,\n",
       " 'carlsbad': 1,\n",
       " 'remarkable': 77,\n",
       " 'being': 918,\n",
       " 'scene': 49,\n",
       " 'death': 330,\n",
       " 'wallenstein': 1,\n",
       " 'its': 1635,\n",
       " 'numerous': 50,\n",
       " 'glass': 116,\n",
       " 'factories': 29,\n",
       " 'mills': 39,\n",
       " 'ha': 75,\n",
       " 'boy': 169,\n",
       " 'sparkled': 5,\n",
       " 'sent': 319,\n",
       " 'great': 792,\n",
       " 'triumphant': 16,\n",
       " 'cloud': 30,\n",
       " 'made': 1007,\n",
       " 'precisely': 24,\n",
       " 'construction': 25,\n",
       " 'sentence': 26,\n",
       " 'frenchman': 102,\n",
       " 'russian': 461,\n",
       " 'uncourteous': 1,\n",
       " 'verbs': 1,\n",
       " 'only': 1873,\n",
       " 'remains': 73,\n",
       " 'therefore': 186,\n",
       " 'discover': 28,\n",
       " 'wanted': 213,\n",
       " 'writes': 20,\n",
       " 'prefers': 2,\n",
       " 'wearing': 87,\n",
       " 'showing': 104,\n",
       " 'face': 1125,\n",
       " 'comes': 91,\n",
       " 'mistaken': 59,\n",
       " 'resolve': 14,\n",
       " 'doubts': 39,\n",
       " 'sharp': 83,\n",
       " 'sound': 219,\n",
       " 'horses': 262,\n",
       " 'hoofs': 24,\n",
       " 'grating': 10,\n",
       " 'wheels': 47,\n",
       " 'curb': 4,\n",
       " 'followed': 329,\n",
       " 'pull': 23,\n",
       " 'whistled': 13,\n",
       " 'pair': 40,\n",
       " 'yes': 688,\n",
       " 'continued': 291,\n",
       " 'glancing': 98,\n",
       " 'window': 186,\n",
       " 'nice': 53,\n",
       " 'brougham': 4,\n",
       " 'beauties': 2,\n",
       " 'hundred': 229,\n",
       " 'fifty': 94,\n",
       " 'guineas': 3,\n",
       " 'apiece': 7,\n",
       " 'money': 326,\n",
       " 'nothing': 646,\n",
       " 'else': 201,\n",
       " 'better': 266,\n",
       " 'bit': 63,\n",
       " 'doctor': 183,\n",
       " 'stay': 74,\n",
       " 'lost': 224,\n",
       " 'boswell': 1,\n",
       " 'promises': 15,\n",
       " 'interesting': 71,\n",
       " 'pity': 75,\n",
       " 'miss': 112,\n",
       " 'client': 33,\n",
       " 'want': 323,\n",
       " 'sit': 89,\n",
       " 'best': 268,\n",
       " 'slow': 65,\n",
       " 'step': 139,\n",
       " 'stairs': 31,\n",
       " 'passage': 110,\n",
       " 'paused': 79,\n",
       " 'immediately': 182,\n",
       " 'outside': 110,\n",
       " 'loud': 64,\n",
       " 'authoritative': 2,\n",
       " 'tap': 10,\n",
       " 'come': 934,\n",
       " 'entered': 282,\n",
       " 'less': 367,\n",
       " 'feet': 179,\n",
       " 'inches': 16,\n",
       " 'height': 36,\n",
       " 'limbs': 67,\n",
       " 'hercules': 4,\n",
       " 'dress': 138,\n",
       " 'rich': 92,\n",
       " 'richness': 2,\n",
       " 'england': 311,\n",
       " 'bad': 155,\n",
       " 'taste': 23,\n",
       " 'bands': 27,\n",
       " 'astrakhan': 1,\n",
       " 'slashed': 3,\n",
       " 'sleeves': 30,\n",
       " 'fronts': 1,\n",
       " 'breasted': 1,\n",
       " 'coat': 172,\n",
       " 'deep': 215,\n",
       " 'cloak': 62,\n",
       " 'thrown': 92,\n",
       " 'shoulders': 125,\n",
       " 'lined': 32,\n",
       " 'flame': 15,\n",
       " 'coloured': 21,\n",
       " 'silk': 50,\n",
       " 'secured': 48,\n",
       " 'neck': 203,\n",
       " 'brooch': 1,\n",
       " 'consisted': 38,\n",
       " 'single': 173,\n",
       " 'flaming': 8,\n",
       " 'boots': 91,\n",
       " 'extended': 75,\n",
       " 'halfway': 19,\n",
       " 'calves': 3,\n",
       " 'trimmed': 8,\n",
       " 'tops': 3,\n",
       " 'fur': 38,\n",
       " 'completed': 25,\n",
       " 'impression': 67,\n",
       " 'barbaric': 2,\n",
       " 'opulence': 3,\n",
       " 'suggested': 69,\n",
       " 'appearance': 135,\n",
       " 'carried': 282,\n",
       " 'broad': 92,\n",
       " 'brimmed': 4,\n",
       " 'hand': 834,\n",
       " 'wore': 58,\n",
       " 'upper': 130,\n",
       " 'extending': 35,\n",
       " 'past': 223,\n",
       " 'cheekbones': 4,\n",
       " 'vizard': 1,\n",
       " 'apparently': 68,\n",
       " 'raised': 212,\n",
       " 'lower': 196,\n",
       " 'appeared': 197,\n",
       " 'hanging': 42,\n",
       " 'straight': 124,\n",
       " 'chin': 30,\n",
       " 'suggestive': 11,\n",
       " 'resolution': 57,\n",
       " 'pushed': 81,\n",
       " 'length': 63,\n",
       " 'obstinacy': 7,\n",
       " 'harsh': 22,\n",
       " 'voice': 462,\n",
       " 'strongly': 41,\n",
       " 'marked': 138,\n",
       " 'accent': 18,\n",
       " 'uncertain': 30,\n",
       " 'pray': 79,\n",
       " 'seat': 170,\n",
       " 'colleague': 7,\n",
       " 'dr': 48,\n",
       " 'occasionally': 89,\n",
       " 'cases': 453,\n",
       " 'whom': 489,\n",
       " 'honour': 16,\n",
       " 'count': 748,\n",
       " 'von': 11,\n",
       " 'kramm': 2,\n",
       " 'nobleman': 11,\n",
       " 'understand': 412,\n",
       " 'discretion': 13,\n",
       " 'trust': 68,\n",
       " 'extreme': 72,\n",
       " 'prefer': 21,\n",
       " 'communicate': 15,\n",
       " 'alone': 337,\n",
       " 'rose': 243,\n",
       " 'caught': 90,\n",
       " 'wrist': 68,\n",
       " 'back': 746,\n",
       " 'chair': 135,\n",
       " 'none': 110,\n",
       " 'say': 755,\n",
       " 'anything': 379,\n",
       " 'shrugged': 35,\n",
       " 'begin': 97,\n",
       " 'binding': 18,\n",
       " 'absolute': 56,\n",
       " 'secrecy': 18,\n",
       " 'years': 571,\n",
       " 'end': 465,\n",
       " 'present': 329,\n",
       " 'weight': 70,\n",
       " 'influence': 138,\n",
       " 'european': 99,\n",
       " 'history': 439,\n",
       " 'promise': 67,\n",
       " 'excuse': 53,\n",
       " 'strange': 220,\n",
       " 'august': 70,\n",
       " 'person': 185,\n",
       " 'employs': 2,\n",
       " 'wishes': 42,\n",
       " 'agent': 25,\n",
       " 'unknown': 87,\n",
       " 'confess': 36,\n",
       " 'once': 569,\n",
       " 'called': 450,\n",
       " 'exactly': 47,\n",
       " 'aware': 52,\n",
       " 'dryly': 5,\n",
       " 'circumstances': 107,\n",
       " 'delicacy': 11,\n",
       " 'precaution': 9,\n",
       " 'taken': 438,\n",
       " 'quench': 3,\n",
       " 'grow': 74,\n",
       " 'seriously': 63,\n",
       " 'compromise': 71,\n",
       " 'families': 45,\n",
       " 'speak': 255,\n",
       " 'plainly': 39,\n",
       " 'implicates': 5,\n",
       " 'house': 661,\n",
       " 'ormstein': 2,\n",
       " 'hereditary': 14,\n",
       " 'kings': 27,\n",
       " 'murmured': 18,\n",
       " 'settling': 16,\n",
       " 'closing': 35,\n",
       " 'glanced': 176,\n",
       " ...}"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 提炼单词\n",
    "def words(info):\n",
    "    return re.findall('[a-z]+',info.lower())\n",
    "# 计算单词出现的次数\n",
    "def wordIndex(wordtext):\n",
    "    words = {}\n",
    "    for word in wordtext:\n",
    "        words[word] = words.get(word,0)+1\n",
    "    return words\n",
    "\n",
    "# 调用方法得到每个单词的使用次数\n",
    "words_map = wordIndex(words(big_info))\n",
    "words_map"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "b9baabd3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 如果存在该单词则返回该单词\n",
    "def known(word):\n",
    "    word1 = []\n",
    "    for word_one in word:\n",
    "        if word_one in words_map:\n",
    "            word1.append(word_one)\n",
    "    return word1\n",
    "\n",
    "# 一次纠正得到一个集合\n",
    "def knomn_edit1(words):\n",
    "    edit1_word = []\n",
    "    n = len(words)\n",
    "    for j in range(n):\n",
    "        word = words[j]\n",
    "        m = len(word)\n",
    "        for i in range(m):\n",
    "            if i != m-1:\n",
    "                edit1_word.append(word[0:i]+word[i+1:])#删除\n",
    "                if i != m-2:\n",
    "                    edit1_word.append(word[0:i]+word[i+1]+word[i]+word[i+2:]) #移动\n",
    "                else:\n",
    "                    edit1_word.append(word[0:i]+word[i+1]+word[i]) #移动\n",
    "                \n",
    "                for ch in alphabet:\n",
    "                    edit1_word.append(word[0:i]+ch + word[i+1:]) # 替换\n",
    "                    edit1_word.append(word[0:i]+ch + word[i:]) #插入\n",
    "                    \n",
    "            else:\n",
    "                edit1_word.append(word[0:(m - 1)])  # 删\n",
    "                for ch in alphabet:\n",
    "                    edit1_word.append(word[0:(m - 1)] + ch)  # 替换\n",
    "                    edit1_word.append(word[0:m] + ch)  # 插入\n",
    "    return edit1_word\n",
    "            \n",
    "                    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "b8520c3a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 二次纠正\n",
    "def know_edit2(word):\n",
    "    words1 = knomn_edit1(word)\n",
    "    words2 = knomn_edit1(words1)\n",
    "    edit2_word = []\n",
    "    for word2 in words2:\n",
    "        if word2 in words_map:\n",
    "            edit2_word.append(word2)\n",
    "    return edit2_word"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "90133d7f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "请输入单词: avd\n",
      "and\n",
      "继续输入？(yes or no )no\n"
     ]
    }
   ],
   "source": [
    "#\n",
    "def correct(word):\n",
    "    canword = known([word]) or known(knomn_edit1([word])) or know_edit2([word])\n",
    "    if len(canword) == 0:#如果两步纠正还是没得到正确单词，那么就默认返回输入的错误单词\n",
    "        canword = [word]\n",
    "        print(canword)\n",
    "    else:\n",
    "        #找到的使用次数最多的单词是\n",
    "        max_info = max(canword, key=lambda w: words_map[w])\n",
    "        print(str(max_info))\n",
    "\n",
    "if __name__ == '__main__':\n",
    "    flag='y'\n",
    "    while flag.lower()=='y':\n",
    "        user_input = input(\"请输入单词: \")\n",
    "        correct(user_input)\n",
    "        flag=input('继续输入？(yes or no )')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c563173d",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eacb5189",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "963e592f",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
